-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement lazy join #1524
base: master
Are you sure you want to change the base?
Implement lazy join #1524
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1524 +/- ##
==========================================
+ Coverage 88.95% 88.99% +0.03%
==========================================
Files 366 366
Lines 33256 33644 +388
Branches 3729 3774 +45
==========================================
+ Hits 29582 29940 +358
+ Misses 2440 2433 -7
- Partials 1234 1271 +37 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first round of reviews, let's discuss the complexity of this, which somewhat surprises me
(I already think my original code is too complex).
src/engine/Join.cpp
Outdated
for ([[maybe_unused]] std::monostate& _ : generator) { | ||
// Generator should never yield, just consume everything in one go. | ||
AD_FAIL(); | ||
} | ||
auto result = std::move(*rowAdder).resultTable(); | ||
result.setColumnSubset(joinColMap.permutationResult()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a function materializedResultFromLazyInput(std::move(generator), std::move(rowAdder), joinColMap)
// Allows the parameters to be null, which causes them to be ignored. | ||
static LocalVocab mergeVocabsIfNecessary( | ||
const std::shared_ptr<const Result>& result1, | ||
const std::shared_ptr<const Result>& result2); | ||
|
||
template <typename T> | ||
ProtoResult monostateGeneratorToResult( | ||
bool requestedLaziness, cppcoro::generator<std::monostate> generator, | ||
std::shared_ptr<const Result> a, std::shared_ptr<const Result> b, | ||
T rowAdder, | ||
ad_utility::InvocableWithExactReturnType<IdTable, T&> auto extractTable, | ||
std::invocable auto postAction) const; | ||
|
||
static bool couldContainUndef(const auto& blocks, const auto& tree, | ||
ColumnIndex joinColumn); | ||
|
||
ProtoResult lazyJoin(std::shared_ptr<const Result> a, ColumnIndex jc1, | ||
std::shared_ptr<const Result> b, ColumnIndex jc2, | ||
bool requestLaziness) const; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some documentation, what do those functions do?
if (!result.empty()) { | ||
co_yield result; | ||
} | ||
*localVocab = mergeVocabsIfNecessary(a, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply merge the local vocabs upfront, and don't need to have them inside the inner generator lambda (the shared pointer magic makes all these things work.
return convertGenerator(std::move(blockOrBlocks)); | ||
} else { | ||
static_assert(ad_utility::alwaysFalse<T>, "Unexpected type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { static_assert(std::ranges::range<t>, "message) ... }
Also is range
the correct concept, don't you explicitly expect generator
for the convertGenerator
function?
auto begL = fullBlockLeft.get().begin(); | ||
auto begR = fullBlockRight.get().begin(); | ||
auto addRowIndex = [&begL, &begR, this](auto itFromL, auto itFromR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for this change?
return [this, begL = fullBlockLeft.get().begin(), | ||
&subrangeRight](auto itFromL) { | ||
if constexpr (potentiallyHasUndef) { | ||
AD_CORRECTNESS_CHECK(!isUndefined_(subrangeRight.front()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this might be in a somewhat tight loop, precompute the condition, and check after the handling of the block or what not else, that the lambda wasn't called etc. or just make this check AD_EXPENSIVE_CHECK
if constexpr (potentiallyHasUndef) { | ||
AD_CORRECTNESS_CHECK(rightSide_.undefBlocks_.empty()); | ||
} | ||
getNextBlocks(equalToCurrentElLeft, leftSide_, rightSide_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These Checks are also related to the part of the code i don't understand yet.
while (side.it_ != side.end_) { | ||
const auto& lBlock = *side.it_; | ||
for (const auto& rBlock : undefBlocks) { | ||
if constexpr (reversed) { | ||
compatibleRowAction_.setInput(rBlock.fullBlock(), lBlock); | ||
} else { | ||
compatibleRowAction_.setInput(lBlock, rBlock.fullBlock()); | ||
} | ||
for (size_t i : ad_utility::integerRange(lBlock.size())) { | ||
for (size_t j : rBlock.getIndexRange()) { | ||
hasUndef = true; | ||
if constexpr (reversed) { | ||
compatibleRowAction_.addRow(j, i); | ||
} else { | ||
compatibleRowAction_.addRow(i, j); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really duplicated to what you did for the cartesian thing above, can something be filtered out here?
} | ||
} | ||
++side.it_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am stopping the review for now.
I am not yet fully sure why the undef support takes three times the code of the old version.
Maybe we can use a design similar to the fully materialized zipper join for bla with undef where the undef handling is completely templated out.
Conformance check passed ✅No test result changes. |
Quality Gate failedFailed conditions See analysis details on SonarCloud Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
This PR adds support for lazily computed join operations.